Abstract: This paper describes an approach to continuous coevolution of form (the morphology) and function (the control behavior) for autonomous vehicles. This study focuses on the coevolution of sensor-suite characteristics, such as the beam width and range of individual sensors, and of reactive strategies for collision-free navigation of an autonomous micro air vehicle. The results of evolving the system in a fixed simulation model were compared to case-based anytime learning (also called continuous and embedded learning), in which the simulation model was updated over time to better match changes in the environment.

1 Introduction

Autonomous vehicles that can change their own morphology on the fly are highly desirable in many domains. For example, the ability of an air vehicle to modify its airframe and the configuration of its control surfaces during certain stages of flight, such as takeoff, attack, or landing, would directly affect the system's efficiency, performance, and safety. Such a shape-shifting, or morphing, mechanism would also be desirable in an Urban Search and Rescue robot, enhancing its ability to traverse difficult internal structures within collapsed buildings.

Evolutionary algorithms have been successfully applied to automate the design of robot morphology, the design of controllers, and, more recently, the coevolution of form and function. We believe that the natural process by which the form and function of living organisms coevolve can be applied to the design of the morphology and control behaviors of autonomous vehicles, simplifying the design process and improving system performance. In our work, coevolution of form and function has been applied to the micro air vehicle (MAV) domain. Designing the sensory payload and the controller for an MAV is complicated by the size of the vehicle (a wingspan on the order of 6 inches), its limited payload, and the great variety of possible applications. The design issue addressed explicitly in this study is the minimization of power requirements; we assume that power efficiency is inversely proportional to the coverage of the sensor suite. The work presented here extends the research published in [Bugajska 2000] and [Bugajska 2002].

In addition, an important problem for all autonomous vehicles that are expected to perform tasks for extended periods is how to adapt the components of the system, in close to real time, in response to unexpected changes in the environment or in the vehicle's own capabilities. Continuous and embedded learning (also called anytime learning) [Grefenstette 1992] is a general approach to continuous learning in changing environments. The vehicle's learning module continuously tests new strategies in an embedded simulation model, which is updated in response to changes in the environment. In the past, this approach has been successfully applied to the learning and adaptation of robotic behaviors in dynamic environments, as well as in situations where the robot experiences sensor failures. This study focuses specifically on the continuous coevolution of a minimal sensor suite that allows the most efficient collision-free navigation in a changing environment. Evolution in a simulation without feedback from the task environment is compared to case-based continuous and embedded learning [Ramsey 1994] in a simulation where such feedback exists.

The remainder of this paper briefly outlines related work and then describes our implementation of the coevolution of sensor-suite characteristics and collision-free navigation for an MAV. The simulated environment, aircraft, and sensors are described along with the details of the learning system. Finally, the initial results of the learning experiments in a changing environment are discussed, and the future direction of the research is outlined.

2 Coevolution of Form and Function

In recent years, results from the evolution of behaviors for autonomous agents in simulation ([Nolfi 1994, Harvey 1992, Schultz 1996, Potter 2001]) and in the real world ([Floreano 1996]), together with research on automating structural design ([Husbands 1996, Funes 1997, Lichtensteiger 1999, Lund 1997, Mark 1998]), have led researchers to explore the coevolution of form and function for autonomous agents. [Cliff 1993] and [Cliff 1996] present research on the concurrent evolution of neural network controllers and visual sensor morphologies for visually guided tracking. [Sims 1994] presents a system for coevolving the morphology and behavior of virtual creatures that compete in a physically simulated three-dimensional world. Similar work is presented in [Hornby 2001], where the body and brain of the creatures are evolved using Lindenmayer systems as a generative encoding. [Lee 1996] presents a hybrid genetic programming/genetic algorithm approach that allows both controllers and robot bodies to be evolved to achieve behavior-specified tasks. [Balakrishnan 1996] presents a comparative study of evolving a control system given a fixed sensor suite versus coevolving the sensor characteristics (placement and range) and the control architecture, for the task of box pushing. In previous work [Bugajska 2000] and [Bugajska 2002], we explored the coevolution of the beam width of the individual sensors in the sensor suite and the collision-free navigation behavior in micro air vehicles, in the context of different controller representations and coevolution approaches. This study extends that work in two ways: it explores the coevolution of form and function in changing environments by combining it with the anytime learning technique, and it evolves the sensing range of the individual sensors in addition to their beam width.

2.1 Representation

In this study, each individual (chromosome) in the population contains the genetic material describing both the morphology and the control behavior of the autonomous agent. The characteristics of the sensor suite are encoded as a floating-point vector with elements for the beam width and the range of each individual sensor in the suite (Section 5.1). The collision-free navigation behavior is represented as a set of stimulus-response rules (Section 4.1).

2.2 Environment

A high-fidelity 3-D flight simulator (Fig. 1), which includes an accurate parameterized model of a 6-inch MAV, was used to model the environment and the vehicle. The simulation allows the user to control the aircraft by specifying only the turn rate; the speed and altitude of the plane are adjusted by low-level PID controllers. In this study, the MAV is controlled by specifying discrete turn rates between -20 and 20 degrees in 5-degree increments.

The trees (obstacles) are modeled as spheres on top of cylinders, and any contact between the plane and a tree constitutes a collision. Tree density is user-defined as the number of trees per unit area, assuming a uniform distribution, and was varied from 1.25 to 5.0 trees per hundred square feet. At the beginning of each simulated flight, the MAV is placed at a random location within a specified area away from the target. The target is stationary and reachable during every trial.

The simulated MAV has a sensor that returns the relative range and bearing to the target. It is also equipped with an array of range sensors positioned symmetrically about the direction of flight and radially from the center of the vehicle. Each range sensor detects obstacles and returns the range to the closest object within its field of view. The beam width and range of the individual range sensors are evolved along with the control behavior.
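The range-sensor behavior just described (return the range to the closest obstacle inside the beam) can be sketched in a few lines. This is a simplified 2-D illustration under stated assumptions, not the simulator's model: obstacles are treated as points in the horizontal plane rather than spheres on cylinders, headings are in degrees counterclockwise from the x-axis, and returning `None` when nothing is detected is our own convention.

```python
import math
from dataclasses import dataclass


@dataclass
class RangeSensor:
    heading_deg: float     # sensor axis relative to the direction of flight
    beam_width_deg: float  # full angular width of the beam
    max_range_ft: float    # maximum sensing range


def angle_diff_deg(a, b):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def read_sensor(sensor, x, y, yaw_deg, obstacles):
    """Range to the closest obstacle point inside the beam, or None if none.

    Assumption: obstacles is an iterable of (ox, oy) points in a 2-D plane.
    """
    if sensor.beam_width_deg <= 0.0 or sensor.max_range_ft <= 0.0:
        return None  # zero width or range: the sensor is not used
    axis = yaw_deg + sensor.heading_deg
    closest = None
    for ox, oy in obstacles:
        dist = math.hypot(ox - x, oy - y)
        if dist > sensor.max_range_ft:
            continue  # beyond sensing range
        bearing = math.degrees(math.atan2(oy - y, ox - x))
        if abs(angle_diff_deg(bearing, axis)) <= sensor.beam_width_deg / 2.0:
            if closest is None or dist < closest:
                closest = dist
    return closest
```

A forward sensor with a 45-degree beam and 20-foot range, for example, ignores an obstacle 63 degrees off the nose and one 30 feet ahead.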

2.3 Fitness Function

Figure 1: Screenshot of the 3-D simulated environment used for the experiments. The white sphere marks the target, and the dark gray (green) spheres with light gray cylinders mark the obstacles (trees).

The morphology of the sensor suite and the control behavior of the MAV are evolved in simulation. Each evaluation consists of a number of episodes. An episode begins with the MAV placed at a random distance from the target, facing in a random direction, followed by a random placement of trees in the environment. An episode ends with the MAV's successful arrival at the target location, with the loss of the MAV because its energy or time runs out, or with the loss of the MAV in a collision with an obstacle. The fitness of an individual is based on the quality of the sensor suite and on task execution and, as in our previous work, is defined as follows:

if (reached the goal)
    payoff is based on
        the distance the MAV traveled (Section 4.3)
        PLUS
        the quality of the sensor suite (Section 5.3)
else if (collision or time out occurred)
    payoff is based on
        the distance away from the target (Section 4.3)

Note that the sensor-suite contribution is considered only once task performance is satisfactory, that is, only when the MAV reaches the goal, and that payoff is assigned only at the end of an episode.
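The payoff rule above can be sketched as a small function. It assumes our reading of the Section 4.3 contribution: 0.3(1 - Da/Ds) for a failed trial and 0.5 + 0.3 Ds/Dt for a successful one, where Ds is the initial distance to the target, Da the minimum distance reached, and Dt the total distance flown. The sensor-suite term from Section 5.3 is passed in precomputed as `form_score`; the function names are illustrative.

```python
def function_contribution(reached_goal, d_start, d_min, d_traveled):
    """Controller contribution (assumed form of the Section 4.3 formula).

    d_start    -- initial distance to the target (Ds)
    d_min      -- minimum distance to the target during the trial (Da)
    d_traveled -- total distance flown during the trial (Dt)
    """
    if reached_goal:
        # 0.5 .. 0.8 when d_traveled >= d_start: shorter paths score higher.
        return 0.5 + 0.3 * (d_start / d_traveled)
    # 0.0 .. 0.3: failing closer to the target scores higher.
    return 0.3 * (1.0 - d_min / d_start)


def episode_payoff(reached_goal, d_start, d_min, d_traveled, form_score):
    """End-of-episode payoff following the pseudocode.

    form_score is the sensor-suite contribution (0.0 .. 0.2); it is added
    only when the MAV reached the goal.
    """
    payoff = function_contribution(reached_goal, d_start, d_min, d_traveled)
    if reached_goal:
        payoff += form_score
    return payoff
```

With this split, the highest possible payoff is 1.0 (a straight-line flight with an empty sensor suite), and no failed episode can score above 0.3, so task success always dominates sensor economy.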

3 Continuous and Embedded Learning

The main focus of this study is the coevolution of form and function over extended periods in changing environments. Continuous and embedded learning ([Grefenstette 1992, Ramsey 1994]) is a general approach to continuous learning in a changing environment. As shown in Figure 2, the agent's learning module continuously tests new strategies against a simulation model of the environment and dynamically updates the knowledge base (behaviors) used by the agent. The execution module controls the agent's interaction with the environment and includes a monitor process that can dynamically modify the simulation model based on its observations of the environment. When the simulation model is modified, learning continues with the modified model. The learning system is assumed to operate indefinitely, and the execution system uses the results of learning as they become available. This learning approach was previously used to continuously evolve tracking behaviors ([Grefenstette 1992]) and door-traversing behavior ([Schultz 2000]) in the face of changing environments and changes in the agent's own capabilities, such as sensor failures.

Figure 2: The anytime (also known as continuous and embedded) learning model.

In this instantiation of anytime learning, the only measurable aspect of the environment is the density of the obstacles (trees). When the monitor detects a change, the environment model is updated and the learning system is re-initialized. Currently, the system is re-initialized using a combination of a simple default strategy and previously evolved strategies, chosen based on their fitness and on the similarity of the model under which they were evolved.

4 Evolution of Function

System performance is determined by the agent's ability to perform the task. In our study, the MAV must navigate efficiently and safely among obstacles (trees) to a target location. The desired behavior should maximize the number of times the MAV reaches the target while minimizing the distance traveled to reach it. Every evaluation is performed in a randomly created environment (random MAV position and orientation, random but uniform tree placement, etc.) whose complexity is determined by the tree density.

4.1 Problem Representation

In this study, the collision-free navigation behavior is implemented as a collection of stimulus-response rules (see [Bugajska 2002] for an alternative approach). Each stimulus-response rule consists of conditions that match against the agent's current sensor readings and an action that the agent should perform. For example, a rule (gene) stating that if there is an obstacle fairly close to and roughly ahead of the vehicle, the vehicle should turn left even when the goal is also ahead of it, could be represented as:

RULE 122

IF sonar4 < 45 AND bearing = [-20, 20]
THEN SET turn_rate = -100


Each rule has an associated strength, as well as a number of other statistics. During each decision cycle, all rules that match the current state are identified, and conflicts are resolved in favor of rules with higher strength. Rule strengths are updated based on the rewards received after each training episode. The following stimuli were defined:

• range1 .. range9: a value between 0.5 and 20 feet, in 1-foot increments, specifying the distance to the closest obstacle within the sensor's field of view.

• range: a value between 0 and 800 feet, in 1-foot increments, specifying the distance to the target.

• bearing: a value between -180 and 180 degrees, in 20-degree increments, specifying the bearing to the target.

The action parameter, turn_rate, specified the turn rate of the MAV and took values between -20 and 20 degrees in 5-degree increments.
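A minimal sketch of the decision cycle described above: every rule whose conditions match the current state is collected, and the strongest one fires. SAMUEL's actual conflict resolution and credit assignment are more elaborate; the interval conditions, the default initial strength, the `update_strengths` averaging rule and its `rate` parameter, and the use of `range5` as a forward-facing sensor are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict       # stimulus name -> (low, high), inclusive bounds
    turn_rate: int         # action in degrees: -20 .. 20 in 5-degree steps
    strength: float = 0.5  # hypothetical initial strength

    def matches(self, state):
        """True if every condition interval contains the current stimulus value."""
        return all(lo <= state[name] <= hi
                   for name, (lo, hi) in self.conditions.items())


def select_action(rules, state, default_turn=0):
    """Collect all matching rules and fire the strongest one."""
    matching = [r for r in rules if r.matches(state)]
    if not matching:
        return default_turn, None  # no rule matched: fall back to a default
    best = max(matching, key=lambda r: r.strength)
    return best.turn_rate, best


def update_strengths(fired_rules, reward, rate=0.1):
    """Hypothetical credit assignment: move each fired rule toward the reward."""
    for r in fired_rules:
        r.strength += rate * (reward - r.strength)
```

Resolving conflicts by taking the maximum strength is deterministic; a stochastic choice weighted by strength would be an equally plausible reading of "in favor of rules with higher strength".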

4.2 Learning Method

The system must learn a collision-free navigation behavior. In this study, the behaviors are evolved using the SAMUEL rule learning system. SAMUEL uses standard genetic algorithms and other competition-based heuristics to solve sequential decision problems. It features Lamarckian operators (specialization, generalization, merging, avoidance, and deletion) that modify rules based on interaction with the environment. SAMUEL must perform a number of evaluations (on the order of 80 in the current study) to provide history for the Lamarckian operators, to coalesce rule strengths, and to account for noise in the evaluations. The original system implementation and the default learning parameters pertinent to the evolution of rule sets are described in greater detail in [Grefenstette 1990] and [Grefenstette 1991].

4.3 Fitness Function Contribution

The fitness contribution of the controller depends on the total distance the MAV traveled during a successful trial, in which the goal location is reached, or on the minimum distance from the target during an unsuccessful trial, in which the agent crashed or ran out of time. It contributes [0.5, 0.8] or [0.0, 0.3], respectively, to the global fitness function; shorter paths and closer approaches score higher. The contribution is calculated as follows:

$$
f_{function}(x) =
\begin{cases}
0.5 + 0.3 \, \dfrac{D_s}{D_t}, & \text{successful trial} \\
0.3 \left(1.0 - \dfrac{D_a}{D_s}\right), & \text{unsuccessful trial}
\end{cases}
$$

where $D_a$ is the minimum distance from the target during the trial, $D_s$ is the initial distance from the target, and $D_t$ is the total distance traveled during the trial.

5 Evolution of Form

The behavior an agent adopts for a task is determined by its ability to sense and interact with the environment. A wide variety of sensors could be implemented on the MAV, but the final makeup of the sensor suite is constrained by the size, weight, and power capacity of the vehicle. The objective of this study is to evolve the most power-efficient sensor suite that still supports efficient task-specific control. For this study, power efficiency is assumed to be inversely proportional to the sensing ability of the agent, as measured by the coverage of its sensor suite.

5.1 Problem Representation

Our range sensor model is based on a simple range sensor such as sonar or radar. It returns the range to the single closest obstacle in its field of view. The possible evolvable sensor characteristics include:

• range of the individual sensor;

• beam width of the individual sensor;

• placement of the individual sensor on the vehicle.

In this study, the beam width and the range of each available sensor are evolved. The number of sensors is evolved implicitly, since a beam width or range of zero means that the sensor is not used. Nine sensors are placed symmetrically about the direction of flight, radially from the center of the vehicle, at 22.5-degree increments, so the array spans 90 degrees to either side of the nose. To reduce the search space, symmetry about the forward axis is exploited: only the forward sensor and the four sensors along one side are represented, and the four sensors along the other side are identical to them. The maximum beam width of a sensor is 45 degrees, and the maximum sensing range is 20.0 feet.

The sensor suite characteristics are represented as a vector of ten values, the beam width and the range of each of the five unique sensors, each encoded as a floating-point value between 0.0 and 1.0. For each sensor, the first gene is mapped to a beam width between 0 and 45 degrees, and the second gene is mapped to a sensing range between 0 and 20 feet.
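Decoding the ten-gene vector into the nine mirrored sensors might look like the sketch below. The gene ordering (width gene, then range gene, for each unique sensor), the linear mapping, and treating unique sensor 0 as the forward sensor are our assumptions; the text fixes only the 0 to 45 degree and 0 to 20 foot ranges and the 22.5-degree spacing.

```python
SENSOR_SPACING_DEG = 22.5
MAX_BEAM_DEG = 45.0
MAX_RANGE_FT = 20.0


def decode_sensor_suite(genes):
    """Map 10 genes in [0, 1] to 9 (heading_deg, beam_width_deg, range_ft) triples.

    Assumption: genes[2*i] is the width gene and genes[2*i + 1] the range gene
    of unique sensor i, with sensor 0 pointing forward (heading 0).
    """
    if len(genes) != 10:
        raise ValueError("expected 10 genes")
    unique = []
    for i in range(5):
        width = genes[2 * i] * MAX_BEAM_DEG       # assumed linear mapping
        rng = genes[2 * i + 1] * MAX_RANGE_FT
        unique.append((i * SENSOR_SPACING_DEG, width, rng))
    suite = list(unique)
    for heading, width, rng in unique[1:]:
        suite.append((-heading, width, rng))      # mirror across the forward axis
    return suite
```

Exploiting the symmetry this way halves the side-sensor genes: the search runs over 10 values instead of 18.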

5.2 The Learning Method

The sensor suite characteristics are also evolved using SAMUEL. In addition to the rule set representation, SAMUEL allows a set of parameters to be attached to each rule set, which we use, as described above, to represent the sensor characteristics. On these parameters, SAMUEL uses Gaussian mutation (mu = 0, sigma = 0.15) and two-point crossover. It uses fitness-proportional selection to choose individuals from the population: the number of offspring is proportional to the parent's fitness.
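The three operators named above can be sketched as follows for the ten-element parameter vector. This is a generic illustration, not SAMUEL's code: applying the mutation to every gene and clipping mutated values back into [0, 1] are assumptions, since the per-gene mutation rate and boundary handling are not stated here, and roulette-wheel sampling with Python's `random.choices` stands in for fitness-proportional selection.

```python
import random


def gaussian_mutation(genes, sigma=0.15, rng=random):
    """Add N(0, sigma) noise to every gene; clipping to [0, 1] is an assumption."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in genes]


def two_point_crossover(a, b, rng=random):
    """Exchange the segment between two distinct random cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def fitness_proportional_select(population, fitnesses, k, rng=random):
    """Roulette-wheel selection: each pick has probability proportional to fitness."""
    return rng.choices(population, weights=fitnesses, k=k)
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible.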

5.3 Fitness Function Contribution

The fitness of the sensor suite decreases linearly with its coverage and contributes [0.0, 0.2] to the global fitness function, but only if the agent's behavior allows it to complete the task, i.e., navigate safely to the target location. The coverage of the sensor suite is calculated as the sum of the areas of the sectors defined by the beam width and the range of the individual sensors. The contribution is calculated as follows:

$$
f_{form}(x) = 0.2 \left(1.0 - \frac{C(x)}{C_{max}}\right)
$$

where $C(x)$ is the coverage of the sensor suite and $C_{max}$ is the maximum possible sensor coverage for the experiment; $C_{max}$ is currently 1413.0 square feet.
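The coverage and form contribution can be checked numerically. Each sensor covers a circular sector of area (θ/2)r² with θ in radians, and the sketch sums these areas as the text describes, so overlaps between neighboring beams are counted twice. With all nine sensors at the maximum 45-degree beam width and 20-foot range, the total is 450π ≈ 1413.7 square feet, which agrees with the reported C_max of 1413.0 to within a square foot. The function names are illustrative.

```python
import math


def sector_area(beam_width_deg, range_ft):
    """Area of a circular sector: (theta / 2) * r^2, with theta in radians."""
    return 0.5 * math.radians(beam_width_deg) * range_ft ** 2


def suite_coverage(sensors):
    """Sum of sector areas; sensors is a list of (beam_width_deg, range_ft)."""
    return sum(sector_area(w, r) for w, r in sensors)


def form_contribution(sensors, c_max):
    """Sensor-suite contribution: 0.2 * (1 - C(x) / C_max)."""
    return 0.2 * (1.0 - suite_coverage(sensors) / c_max)


full_suite = [(45.0, 20.0)] * 9
print(round(suite_coverage(full_suite), 1))
```

An empty suite earns the full 0.2, and a suite at maximum coverage earns nothing.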

6 Experimental Design

Similar to [Grefenstette 1992], we compared traditional evolution in a simulated environment with no feedback from the task environment to case-based continuous and embedded learning in a simulated environment that reflected the current state of the world. These can be viewed as alternative approaches to system development: in the first case, learning is done offline in a simulation designed by experts, while in the second case, learning is performed online after the system has been deployed. We performed three separate experiments: two baseline experiments that explored evolution in a static simulation environment, and one that applied anytime coevolution of form and function to a dynamic simulation environment. Each experiment lasted 450 generations with a population of 100 members. The complexity of the environment was changed every 25 generations, giving 18 periods.

6.1 Experiment 1: Fixed complexity simulation model

In this experiment, all candidate solutions throughout the experiment were evaluated in a series of simulated environments with the same constant complexity, independent of the changing environment. The tree density, which determines the complexity of the environment, was set to 2.5 trees per 100 square feet, a value previously determined to provide an adequate learning gradient and an acceptable level of generalization to other densities. Whenever the learning system found a solution that outperformed the previous one in simulation, the online strategy was updated. Changes in the environment were not registered in the simulation, and learning continued uninterrupted throughout the experiment.

6.2 Experiment 2: Sampled complexity simulation model

As in the first baseline experiment, all individuals were evaluated in a series of simulated environments independent of the changing environment, but here the complexity varied: the tree density of each simulated environment was chosen uniformly at random from three densities, 1.25, 2.5, and 5 trees per 100 square feet. Whenever the learning system found a solution that outperformed the previous one in simulation, the online strategy was updated. To establish the baseline, changes in the environment were not registered in the simulation, and learning continued uninterrupted throughout the experiment.
Figure 3: Summary of task performance in a changing environment.

6.3 Experiment 3: Dynamic simulation model

In this experiment, the individuals of the current generation were evaluated in a series of simulated environments whose complexity was determined by the current, changing environment. The tree density varied among the same densities as in the previous experiment: 1.25, 2.5, and 5 trees per 100 square feet. Each density was treated as a separate case. For the first three periods (25 generations each), the cases were presented in increasing order of complexity. For the rest of the experiment, the order of the cases within each block of three periods was selected at random. Each case was presented a total of six times. For this study, the environments were presented in the following order: F (1.25), M (2.5), H (5.0), H, F, M, F, H, M, F, M, H, M, F, H, F, H, M. Whenever the learning system found a solution that outperformed the previous one in simulation, the online strategy was updated. When a change in the environment was detected, the simulation was updated and the offline learning was re-initialized according to the case-based anytime learning strategy. On the first occurrence of a case, the population was initialized using a homogeneous, simple default set of rules. On subsequent occurrences, one half of the initial population was initialized based on a similarity metric between the current case and the previously observed cases in the case base, while the other half was initialized using a default rule set. In this study, the similarity metric was simply the absolute difference in tree density between environments.
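The presentation schedule and the case-based re-initialization can be sketched as below. The `case_base` structure (a density mapped to stored (fitness, rule set) pairs), the ranking of stored strategies by similarity first and fitness second, and the use of strings as stand-in rule sets are assumptions based on the description here and in Section 3; only the schedule, the 25-generation period, the half/half split, and the absolute-density-difference metric come from the text.

```python
# Presentation order from Experiment 3; each letter covers 25 generations.
SCHEDULE = "F M H H F M F H M F M H M F H F H M".split()
DENSITIES = {"F": 1.25, "M": 2.5, "H": 5.0}  # trees per 100 square feet


def density_at(generation, period=25):
    """Tree density of the task environment at a given (0-based) generation."""
    return DENSITIES[SCHEDULE[generation // period]]


def reinitialize(current_density, case_base, default_rules, pop_size=100):
    """Build the initial population after the monitor detects a new density.

    Assumption: case_base maps a previously observed density to a list of
    (fitness, rule_set) pairs saved at the end of earlier periods.
    """
    if current_density not in case_base:
        # First occurrence of this case: homogeneous default population.
        return [default_rules] * pop_size
    # Similarity metric: absolute difference in tree density.
    stored = [
        (abs(density - current_density), fitness, rules)
        for density, entries in case_base.items()
        for fitness, rules in entries
    ]
    # Most similar model first; ties broken by higher fitness (assumed ranking).
    stored.sort(key=lambda item: (item[0], -item[1]))
    half = pop_size // 2
    seeded = [rules for _, _, rules in stored[:half]]
    # The remaining slots (at least half) get the default rule set.
    return seeded + [default_rules] * (pop_size - len(seeded))
```

Keeping at least half of the population at the default rule set preserves diversity when the stored strategies are poorly suited to the new case.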

7 Results

The results of anytime coevolution of form and function in a changing environment for each of the approaches described in Section 6 are summarized in Figures 3 through 6 and in Table 1.
Figure 4: Task performance in the low complexity (1.25 trees per 100 sq. ft) environment.

Figures 3 through 6 show the online performance of the best individuals for each approach. Each data point represents the average performance of a best-so-far individual over 100 episodes, where performance is defined as the number of times, out of 100, that the MAV reached the goal. The data were averaged over 3 independent sets of runs for each of the baseline and the anytime learning experiments. Figure 3 summarizes the online performance of the system in the changing environment. The vertical lines in the plot mark environment changes. The complexity of the environment in each period is shown along the horizontal axis: F indicates the lowest, M the medium, and H the highest tree density. Figures 4 through 6 present each level of environment complexity individually, with all relevant periods concatenated.

Case-based continuous and embedded learning outperformed both alternative approaches. It is also worth noting that, even though the simulation models were not updated during learning in Experiments 1 and 2, the evolved strategies were to some degree tolerant of changes in the environment. Furthermore, the strategies evolved in a simulation with fixed complexity were more general than those evolved in a simulation that sampled the complexity space.

Table 1 summarizes the characteristics of the final sensor suites for each approach, averaged over 3 independent sets of runs for each of the baseline and the anytime learning experiments. The beam width and range of the five unique sensors and the total coverage of the sensor suite are presented. The goal of the evolution of form was to evolve a sensor suite with minimal coverage in order to maximize the power efficiency of the vehicle, which was defined to be inversely proportional to sensor coverage.

By design, the anytime learning approach allowed a higher level of specialization of the sensor suites for individual cases, but it was also able to improve on the sensor suite evolved in the static medium complexity environment: the anytime suite for the medium (M) case has a total coverage of 82.1 square feet, compared with 106.5 for the fixed-complexity suite. In general, the sensor suites evolved for the low density environment did not require the full set of sensors, and all sets included a narrow, far-reaching front sensor and several shorter side sensors. The higher density environments required a more uniform distribution of sensing coverage among all available sensors.

              Sensor 1      Sensor 2      Sensor 3      Sensor 4      Sensor 5      Coverage
              Width Range   Width Range   Width Range   Width Range   Width Range   (sq ft)
Fixed           0.0   0.0    12.5   1.7    14.8   6.5    25.0   6.6     5.4  14.5     106.5
Sampled        15.8  10.8     7.5   3.7     3.2   5.7    29.6  11.8     2.2  10.4     108.4
Anytime (L)    23.2   9.8    14.5  10.4    16.2  12.2    30.9   6.1     5.1  13.7     108.6
Anytime (M)    17.0  10.3    16.1   8.9    27.5   4.3    11.9   4.4     3.6  11.9      82.1
Anytime (H)    14.0   7.7    18.1   8.7    12.2   6.5     9.7   1.7     2.9  14.1      70.8

Table 1: Characteristics of the sensor suites evolved using traditional evolution in fixed and sampled simulation environments, and using case-based anytime learning in a dynamic simulation environment. The beam width (degrees) and range (feet) of the unique sensors and the total coverage (square feet) of the sensor suites are presented.

Figure 5: Task performance in the medium complexity (2.5 trees per 100 sq. ft) environment.

These results show that anytime learning is a feasible approach to continuous coevolution of form and function.

8 Conclusions

In this paper, we presented an approach to continuous and embedded coevolution of form (the morphology) and function (the control behavior) for autonomous vehicles. While this study focused only on the coevolution of sensor characteristics, such as the beam width and range of individual sensors in the sensor suite, and of reactive strategies for collision-free navigation of an autonomous micro air vehicle, the approach could be extended to the evolution of more complete morphologies for more complex missions. The addition of an anytime (continuous and embedded) learning mechanism allows for more robust and adaptive systems. In particular, it opens the door to vehicles that can morph, that is, change their configuration on the fly for different aspects of a mission or to handle unexpected situations.

Figure 6: Task performance in the high complexity (5.0 trees per 100 sq. ft) environment.

We presented experimental results showing that continuous and embedded learning is a feasible approach to anytime coevolution of form and function. Further experiments will be performed to determine appropriate anytime learning components for this domain, such as re-initialization policies or the minimum case presence. We plan to extend this work to learn characteristics of an air vehicle's airframe that might be changed during a mission, such as the length of the tail structure and the shape and geometry of the airfoils.